fix(llmobs): ensure langchain azure openai spans are not duplicate llm marked #14939
Conversation
lgtm, thanks for the quick fix!
Bootstrap import analysis: comparison of import times between this PR and base.

Summary: the average import time from this PR is 242 ± 6 ms; the average import time from base is 242 ± 3 ms. The difference between this PR and base is 0.1 ± 0.2 ms, which is not statistically significant (z = 0.44).
Performance SLOs: comparing candidate yunkim/langchain-aoai (004a7db) with baseline main (18a5e9a).

📈 Performance regressions (1 suite):
- iastaspectsospath (24/24): all time and memory checks pass their SLOs. Notable time regressions vs baseline: ospathjoin_aspect 📈 +14.5% (7.143µs, SLO <10.000µs) and ospathsplitext_aspect 📈 +15.4% (5.290µs, SLO <10.000µs).

🟡 Near SLO breach (4 suites), all checks still within SLO:
- djangosimple (30/30): memory within ~1.5% of the SLO for the profiler (53.713MB vs <54.500MB), tracer-and-profiler (66.650MB vs <67.500MB), tracer-dont-create-db-spans (65.402MB vs <66.000MB), tracer-minimal (65.419MB vs <66.000MB), and tracer-native (71.465MB vs <72.500MB) scenarios.
- errortrackingdjangosimple (6/6): memory within ~2% of the SLO (<66.500MB) for all three scenarios (errortracking-enabled-all, errortracking-enabled-user, tracer-enabled).
- flasksimple (18/18): profiler memory within 1.0% of the SLO (46.537MB vs <47.000MB).
- telemetryaddmetric (30/30): 100-distribution-metrics-100-times time within 0.8% of the SLO (2.282ms vs <2.300ms).
fix(llmobs): ensure langchain azure openai spans are not duplicate llm marked (#14939) [MLOB-4230]

## Description

This PR does 3 things:

1. (non-user facing) Updates our docker-compose and services.yml files to upgrade to the latest testagent version, and adds an env var `VCR_PROVIDER_MAP` value to the testagent configs.
2. (user-facing) Fixes the langchain integration so that Azure OpenAI calls are not marked as duplicate LLM spans (when the openai integration is enabled) and are instead marked as generic workflow spans.
3. (non-user facing) Adds langchain tests for calling Azure OpenAI. These require the testagent upgrade and the `VCR_PROVIDER_MAP` env var to allow the testagent VCR proxy to call the Azure OpenAI endpoint.

We have logic in our langchain integration to mark specific LLM calls as generic workflow spans (instead of the default llm span) if we detect that the corresponding integration (for the given provider, e.g. `openai`/`anthropic`) is also enabled and will produce a downstream LLM span. Our product experience breaks if multiple spans represent the same LLM call, and we were previously missing support for Azure OpenAI.

[MLOB-4230]: https://datadoghq.atlassian.net/browse/MLOB-4230?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

(cherry picked from commit 9f7d187)
MLOB-4230
Description
This PR does 3 things:
1. (non-user facing) Updates our docker-compose and services.yml files to upgrade to the latest testagent version, and adds an env var `VCR_PROVIDER_MAP` value to the testagent configs.
2. (user-facing) Fixes the langchain integration so that Azure OpenAI calls are not marked as duplicate LLM spans (when the openai integration is enabled) and are instead marked as generic workflow spans.
3. (non-user facing) Adds langchain tests for calling Azure OpenAI. These require the testagent upgrade and the `VCR_PROVIDER_MAP` env var to allow the testagent VCR proxy to call the Azure OpenAI endpoint.

We have logic in our langchain integration to mark specific LLM calls as generic workflow spans (instead of the default llm span) if we detect that the corresponding integration (for the given provider, e.g. `openai`/`anthropic`) is also enabled and will produce a downstream LLM span. Our product experience breaks if multiple spans represent the same LLM call, and we were previously missing support for Azure OpenAI.

Testing
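The deduplication rule described above can be sketched roughly as follows. This is an illustrative sketch only: the helper name `choose_span_kind`, the provider strings, and the mapping table are hypothetical stand-ins, not the actual dd-trace-py internals.

```python
# Hypothetical sketch of the duplicate-LLM-span check described above.
# Names and provider strings are illustrative, not dd-trace-py's real API.

# Map each langchain provider to the dedicated integration that would also
# trace its LLM calls. Azure OpenAI requests go through the openai client,
# so the openai integration emits the actual LLM span downstream (the fix).
PROVIDER_TO_INTEGRATION = {
    "openai": "openai",
    "azure_openai": "openai",  # previously missing: caused duplicate llm spans
    "anthropic": "anthropic",
}

def choose_span_kind(provider: str, enabled_integrations: set) -> str:
    """Return "workflow" when a downstream integration will emit the LLM
    span itself; otherwise mark this langchain span as the "llm" span."""
    integration = PROVIDER_TO_INTEGRATION.get(provider)
    if integration is not None and integration in enabled_integrations:
        return "workflow"  # avoid two spans representing the same LLM call
    return "llm"

print(choose_span_kind("azure_openai", {"openai"}))  # workflow
print(choose_span_kind("azure_openai", set()))       # llm
```

With the openai integration enabled, the langchain span for an Azure OpenAI call becomes a generic workflow span and the openai integration's span remains the single LLM span; with it disabled, the langchain span stays the LLM span so the call is still captured.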
Risks
Additional Notes